External models (Gemini Nano Banana & OpenAI GPT Image) (#8633) by CypherNaught-0x · Pull Request #8884 · invoke-ai/InvokeAI

CypherNaught-0x · 2026-02-17T11:29:59Z

Summary

This PR adds support for external model provider APIs with Google and OpenAI added for now.
It supports txt2img, img2img and image references.
I tried to make it fit well within the application and be easily extensible for future models.

Related Issues / Discussions

#8633 includes functionality requested here

QA Instructions

Select an external provider in the model setup dialog and add an API key.
Select a new model from the dropdown list.
...
Profit

Checklist

The PR has a short but descriptive title, suitable for a changelog
Tests added / updated (if applicable)
❗Changes to a redux slice have a corresponding migration
Documentation added / updated (if applicable)
Updated What's New copy (if doing a release after this PR)

Pfannkuchensack · 2026-02-19T11:42:59Z

I did some testing. It works fine (I only tested Gemini).

A few comments

Reidentify button: The "Reidentify" button in the Model Manager should not be shown for external models.
Auto-install starter models: Auto-install should always be enabled for external starter models. When an API key is removed, the associated external models should also be removed.
Install queue status: The install queue shows "Unknown" when installing external models. This needs to display the correct model name/status.
Starter model description: The text for external starter models needs to clearly indicate that an API key is required and that usage may incur costs. (And the Starter Models are not needed if the Autoinstall is always on)
Canvas settings for external models: It is currently unclear which canvas settings are actually passed to external models. Right now all standard settings (Scheduler, Steps, CFG Scale, and everything under Advanced) are displayed, but most of these are not used by external models. We need a solution where external models can define their required settings as JSON, and the frontend renders only the relevant controls based on that definition.
External Image Generation node: The "External Image Generation" node also contains these values. The core issue is that we cannot have dynamic nodes. Instead, we should have a dedicated settings node for each external model node.
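One way to picture the canvas-settings suggestion above is a small, declarative definition per external model that the UI filters against. The sketch below is only an illustration: the control names, the `settings` key, and `visible_controls` are hypothetical and not part of the PR.

```python
import json

# Hypothetical full set of canvas controls the UI knows how to render.
ALL_CONTROLS = ["scheduler", "steps", "cfg_scale", "aspect_ratio", "temperature"]


def visible_controls(definition_json: str) -> list[str]:
    """Return the controls to render, in UI order, for one external model."""
    definition = json.loads(definition_json)
    supported = set(definition.get("settings", []))
    return [name for name in ALL_CONTROLS if name in supported]


# Hypothetical definition for a model that ignores Scheduler, Steps and CFG Scale.
example_definition = '{"settings": ["aspect_ratio", "temperature"]}'
print(visible_controls(example_definition))
```

A model that declares no settings would render none of the optional controls, which matches the goal of not showing parameters that are never sent to the provider.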

Pfannkuchensack · 2026-02-19T11:49:53Z

Some changes that should be made:

In invokeai/app/services/external_generation/external_generation_default.py, the method _refresh_model_capabilities does:

from invokeai.app.api.dependencies import ApiDependencies
record = ApiDependencies.invoker.services.model_manager.store.get_model(request.model.key)

No other service in the codebase imports from invokeai.app.api.dependencies. All other services receive their dependencies via constructor injection through InvocationServices. This is an architectural violation that makes the service harder to test in isolation and creates a hidden coupling between the service and API layers.
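As a rough illustration of the constructor-injection alternative, the sketch below passes the record store in explicitly. The `ModelRecordStore` protocol, the dict-shaped records, and `InMemoryStore` are simplified stand-ins, not the real InvokeAI types.

```python
from typing import Protocol


class ModelRecordStore(Protocol):
    """Hypothetical minimal interface for the model record store."""

    def get_model(self, key: str) -> dict: ...


class ExternalGenerationService:
    """Sketch: the store is injected instead of looked up via ApiDependencies."""

    def __init__(self, store: ModelRecordStore) -> None:
        self._store = store

    def refresh_model_capabilities(self, model_key: str) -> dict:
        record = self._store.get_model(model_key)
        return record.get("capabilities", {})


class InMemoryStore:
    """Test double that satisfies the protocol without any API layer."""

    def __init__(self, records: dict) -> None:
        self._records = records

    def get_model(self, key: str) -> dict:
        return self._records[key]
```

With this shape, a unit test can construct the service with `InMemoryStore` and never import anything from the API layer.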

In invokeai/app/services/model_install/model_install_common.py:

MODEL_SOURCE_TO_TYPE_MAP = {
    ...
    ExternalModelSource: ModelSourceType.Url,
}

ExternalModelSource is not a URL source. There is no ModelSourceType.External enum value in taxonomy.py. This means external models get recorded as Url-type sources in the database, which is semantically incorrect and could cause issues in any code that branches on source_type.
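A minimal sketch of the suggested fix, assuming a new `External` member is added to the enum. The enum members and source classes here are simplified stand-ins for the real ones in taxonomy.py and model_install_common.py, and the `"external"` string value is an assumption.

```python
from enum import Enum


class ModelSourceType(str, Enum):
    """Simplified stand-in for the enum in taxonomy.py."""

    Path = "path"
    Url = "url"
    External = "external"  # hypothetical new member


class URLModelSource:
    """Stand-in for the real URL source class."""


class ExternalModelSource:
    """Stand-in for the source class added by this PR."""


MODEL_SOURCE_TO_TYPE_MAP = {
    URLModelSource: ModelSourceType.Url,
    ExternalModelSource: ModelSourceType.External,
}


def source_type_for(source: object) -> ModelSourceType:
    """Look up the recorded source type for a source instance."""
    return MODEL_SOURCE_TO_TYPE_MAP[type(source)]
```

Code that branches on `source_type` can then tell external models apart from URL downloads.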

In invokeai/app/api/routers/app_info.py:

for config in (runtime_config, file_config):
    config.update_config(updates)
    for field_name, value in updates.items():
        if value is None:
            config.model_fields_set.discard(field_name)

This directly mutates the model_fields_set of the global singleton InvokeAIAppConfig, bypassing Pydantic's field-tracking internals. Concurrent requests to set_external_provider_config or reset_external_provider_config could race on this shared mutable set.
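One possible mitigation is to serialize the updates behind a single lock, so that two requests never interleave their writes. The sketch below uses a plain stand-in class rather than the real InvokeAIAppConfig, and the field name is hypothetical. Whether a threading lock is the right primitive depends on how the route handlers are actually executed.

```python
import threading


class FakeConfig:
    """Stand-in for the shared config object; not the real InvokeAIAppConfig."""

    def __init__(self) -> None:
        self.values: dict = {}
        self.fields_set: set = set()


_config_lock = threading.Lock()


def apply_provider_updates(configs: list, updates: dict) -> None:
    """Apply updates to every config copy while holding one lock."""
    with _config_lock:
        for config in configs:
            for name, value in updates.items():
                if value is None:
                    # A None value means "reset": forget the field entirely.
                    config.values.pop(name, None)
                    config.fields_set.discard(name)
                else:
                    config.values[name] = value
                    config.fields_set.add(name)
```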

In invokeai/app/services/model_install/model_install_default.py, _register_external_model generates a deterministic key via slugify(f"{provider_id}-{provider_model_id}"). Installing the same external model twice produces the same key. While the DB layer catches this with DuplicateModelException, there is no proactive check or update-if-exists logic, resulting in an unhelpful error for the user.
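The update-if-exists idea could look roughly like the following. The dict-backed store, the simplified `slugify`, and `register_external_model` are illustrative stand-ins, not the PR's actual code.

```python
import re


def slugify(text: str) -> str:
    """Simplified stand-in for the slugify helper used by the PR."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def register_external_model(
    store: dict, provider_id: str, provider_model_id: str, config: dict
) -> tuple[str, bool]:
    """Insert or update a record; return (key, created)."""
    key = slugify(f"{provider_id}-{provider_model_id}")
    created = key not in store
    # Merge into the existing record instead of raising on a duplicate key.
    store[key] = {**store.get(key, {}), **config}
    return key, created
```

Installing the same model a second time then refreshes the record and reports `created=False`, which the caller can turn into a friendlier message than a duplicate-key error.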

In invokeai/app/api/routers/model_manager.py, list_model_records uses setattr(model, "capabilities", ...) and setattr(model, "default_settings", ...) on Pydantic model instances. Pydantic v2 models may not support direct attribute mutation without validate_assignment = True. The PR itself uses model_copy(update=...) correctly in other places (e.g., _apply_starter_overrides in external_generation_default.py), so this is inconsistent.

I don't think saving the API keys in invokeai.yaml is the best choice here. This is something the development team should discuss in https://discord.com/channels/1020123559063990373/1049495067846524939.

lstein · 2026-02-23T04:23:22Z

@CypherNaught-0x Please see the changes requested from @Pfannkuchensack above.

CypherNaught-0x · 2026-02-23T10:21:03Z

Thanks @Pfannkuchensack for the valuable feedback!

I didn't feel comfortable disabling all the inputs, since I hadn't seen this done elsewhere, but you are of course right that it makes no sense to show, for example, CFG when that property is not used.
For the other details I went with the interface of model_supports_x. We could do this for the advanced properties as well.
From a UX standpoint I am wondering if it's better to disable these properties or hide them completely.
Do you have any preference or insight on how this is handled elsewhere?

This was more of a first draft since I wasn't sure how such a large addition would be received so I haven't yet spent much time polishing things like the install queue. I was positively surprised with the feedback so I'll try and get things to a more polished state for the next review round.
I also haven't really done extensive testing on the OpenAI implementation so I will get that done also. Glad things worked for Gemini on your end though. I tested on different systems with fresh installs but it's nice to have external confirmation.

How are the discussions on the API key storage coming along? I saw that the model marketplace can store API keys there as well, so I figured this might be OK with a decently restricted API key, though I'd obviously prefer at least non-plain-text storage.

Pfannkuchensack · 2026-02-23T10:35:09Z

Pfannkuchensack@3c83692: I did some work on hiding unneeded controls in the UI. Maybe take a look, or copy the whole thing from there.

Pfannkuchensack · 2026-02-23T16:00:20Z

The API keys should go in a separate YAML file. This is better because it keeps the API keys separate.

lstein · 2026-02-24T02:33:25Z

The API keys should go in a separate YAML file. This is better because it keeps the API keys separate.

We need a unified place to stash users' security tokens and API keys. I just proposed a "Token Manager" in Issue #8904. Temporarily, you could add nano_banana_key and gpt_image_key to the InvokeAIAppConfig class in invokeai.app.services.config.config_default and stash the keys in invokeai.yaml.

@Pfannkuchensack Does this seem like a reasonable interim solution to API key storage or would it be better to have a completely separate API keys file, like ~/invokeai/api_keys.yaml?

Pfannkuchensack · 2026-02-24T14:24:03Z

I would prefer the separate file, especially since another solution will follow later. That way we avoid major changes to the invokeai.yaml file.

lstein · 2026-02-27T03:18:41Z

@CypherNaught-0x I'm wondering what you see as the timetable for this? I'm thinking we'll be ready for a 6.12 release in the second week of March. Would that be targetable, or later? The release after that will likely be mid April.

CypherNaught-0x · 2026-02-27T09:22:32Z

Pfannkuchensack@3c83692: I did some work on hiding unneeded controls in the UI. Maybe take a look, or copy the whole thing from there.

I had already started work on this and your implementation looks very similar so I'll try and integrate them. Also very much looking forward to Seedream support 👍

CypherNaught-0x · 2026-02-27T09:25:22Z

@lstein I've had some time to work on it. I'll try to get things into a polished state and push the changes. I believe mid-March is a very reasonable release target.

…rnal graph
- Export imageSizeChanged from paramsSlice (required by the new ImageSize recall handler).
- Emit the external graph's metadata model entry via zModelIdentifierField since ExternalApiModelConfig is not part of the AnyModelConfig union.

lstein · 2026-04-17T03:39:59Z

Thanks for the fixes. I've done some functional testing with the Gemini models and encountered a few remaining hitches.

Although you can no longer add inpaint mask layers to the canvas, when you create a new canvas it is still initialized with a default (empty) inpaint mask layer. If inpainting masking isn't supported, then there shouldn't be an inpaint mask at all.
The gemini invocation node has fields for Init Image and Mask Image. However, my understanding of the API is that Gemini doesn't support raster-based img2img or image masks. If this is so, these should be removed from the node.
Generating with the gemini node gives me the following error: Invalid JSON payload received. Unknown name \"thinkingConfig\": Cannot find field. This occurs with each of the three starter models.
The OpenAI GPT models don't accept raster images for img2img or image masks, but the DALL-E models do. Perhaps the OpenAI models should be split into two invocation nodes, one with the raster parameters and the other without?

Pfannkuchensack · 2026-04-17T10:00:24Z

https://ai.google.dev/gemini-api/docs/image-generation?hl=en#2_inpainting_semantic_masking
There is a mask feature.

I'll take a look at the rest.

…estrict GPT Image models to txt2img

lstein · 2026-04-18T16:16:15Z

I'm still uncertain that inpaint masks are usable with the external models.

Observations on the OpenAI models, using the node editor:

Using the OpenAI Image Generation node in either inpaint or img2img mode with any of the three GPT Image models or DALL-E3 results in ExternalProviderCapabilityError: Mode 'img2img'/'inpaint' is not supported.
inpaint mode is accepted by DALL-E2, but doesn't seem to result in any change to the init image. I tried with both a bitmap and with an image that used the alpha channel for the mask.
DALL-E2 in img2img mode with an init image but no mask gives "Invalid input image - format must be in ['RGBA', 'LA', 'L']". These problems may be related to DALL-E using the alpha channel of the image as its mask. However, when I feed an RGBA init image to DALL-E2 in img2img mode, it still complains that it needs an RGBA image, which I find confusing.
I see no difference between putting an image into init image vs reference images.

Observations on the Gemini models, using the node editor:

The node accepts all combinations of img2img, inpaint with or without the init and mask images and does not complain. However, I can't get the mask to have any effect. Looking at the Gemini documentation regarding the mask, the inpaint mask appears to refer to a prompt semantic mask, not a bitmask or transparency channel.
I see no difference between putting an image into init image vs reference images.

If image mask-based inpainting isn't working, let's just remove the init image and mask image fields from the nodes.

The docs conflicts should disappear when I merge the new docs PR in.

- Remove img2img and inpaint modes from Gemini models (Gemini has no bitmap mask or dedicated edit API; image editing works via reference images in the UI)
- Fix DALL-E 2 inpainting: convert grayscale mask to RGBA with alpha channel transparency (OpenAI expects transparent=edit area) and convert init image to RGBA when mask is present
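The mask conversion described in this commit can be sketched without an imaging library by treating the mask as rows of grayscale values. This is an illustration only: it assumes that bright mask pixels mark the area to edit, which the thread does not state, and `mask_to_rgba` is a hypothetical helper rather than the PR's code.

```python
from typing import List, Tuple

RGBA = Tuple[int, int, int, int]


def mask_to_rgba(mask: List[List[int]], threshold: int = 128) -> List[List[RGBA]]:
    """Convert a grayscale mask (0-255) into RGBA pixels.

    Assumption: values at or above the threshold mark the edit area, so they
    become fully transparent; everything else becomes fully opaque.
    """
    return [
        [(0, 0, 0, 0) if value >= threshold else (0, 0, 0, 255) for value in row]
        for row in mask
    ]
```

If the application's masks use the opposite polarity, the comparison would need to be inverted.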

lstein · 2026-04-19T02:05:44Z

@Pfannkuchensack Thanks for the recent fixes and I'm looking forward to getting this PR finished and merged. It's been a lot of work!

Unfortunately I haven't been able to get inpainting working with the DALL-E2 model (which as far as I can tell is the only model that uses a mask). I assign a black and white bitmap mask, but the entire image gets modified, not just the masked region. Does inpainting work in your hands?

I also have a philosophical question about whether the Gemini invocation node should even show the modes, the init image and the mask image fields. These are not supported by any Gemini model, and showing them as usable UI fields may confuse people.

Similarly, please consider whether the OpenAI invocation node should show these fields, since only the old DALL-E2 model uses them. The others are edit models.

- Remove DALL-E 2 from starter models (deprecated, shutdown May 12 2026)
- Enable img2img for GPT Image 1/1.5/1-mini (supports edits endpoint)
- Set Gemini models to txt2img only (no mask/edit API; editing via ref images)
- Hide mode/init_image/mask_image fields on Gemini node (not usable)
- Hide mask_image field on OpenAI node (no model supports inpaint)

lstein · 2026-04-19T15:32:53Z

The Gemini invocation node looks good and is ready to go.

Major comment
Reading the OpenAI documentation, it looks like the GPT and DALL-E models take an "action" setting of auto, generate or edit but no mode directly corresponding to inpaint or img2img (https://developers.openai.com/api/docs/guides/image-generation). Internally I see OpenAIProvider.generate() calls either the generate or edit API endpoint depending on whether an init image or at least one reference image is provided. In fact, there is no functional difference between the init image and the reference image.
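The dispatch described above reduces to a single condition. The sketch below restates it with a hypothetical helper name and simplified argument types; it is not the body of OpenAIProvider.generate().

```python
from typing import Optional, Sequence


def choose_openai_endpoint(
    init_image: Optional[bytes], reference_images: Sequence[bytes]
) -> str:
    """Return "edit" when any input image is supplied, otherwise "generate"."""
    if init_image is not None or len(reference_images) > 0:
        return "edit"
    return "generate"
```

Because an init image and a reference image lead to the same branch, the two inputs are interchangeable from the API's point of view.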

I suggest:

Either hide the OpenAI node's mode parameter entirely, or replace it with an action of [auto, generate, edit] to follow the OpenAI API more closely. I am happy with the easier of the two solutions.
Hide the OpenAI node's init_image field. It doesn't do anything that the ref image field doesn't, and may confuse people who think it will behave as a raster layer.

Minor comment
The OpenAI invocation node no longer accepts an inpaint mask (correct decision), but still has inpaint as one of its modes. However, this is nonfunctional, so maybe it should be removed?

lstein · 2026-04-19T15:44:01Z

Another thing I have noticed when using either of the external generation nodes: when I have "Use cache" and "Save to Gallery" checked in the external generation node and hit the Invoke button multiple times, I only get an image output to the gallery on the first try. On subsequent generations, the log indicates that the job is not queued and there is no output. However, the Invoke button continues to spin as if something were happening.

I would have expected to get multiple identical images using the cached values rather than no image.

- Hide OpenAI node's mode and init_image fields: OpenAI's API has no img2img/inpaint distinction (the edits endpoint is invoked automatically when reference images are provided). init_image is functionally identical to a reference image and was misleading users.
- Default use_cache to False for external image generation nodes: external API calls are non-deterministic and incur usage costs. Cache hits returned stale image references that did not produce new gallery entries on repeat invokes.

External image generation nodes use the standard invocation cache, but returning the cached output (with stale image_name references) on cache hits resulted in no new gallery entries — the Invoke button would spin indefinitely on repeat invokes with identical parameters. Override invoke_internal so that on cache hit, the cached images are loaded and re-saved as new gallery entries. The expensive API call is still skipped (cost saving), but the user sees a new image as expected.
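The cache-hit behaviour described in this commit can be sketched as follows. The in-memory gallery, the dict cache, and `invoke_external` are hypothetical stand-ins that only illustrate the control flow, not the actual invoke_internal override.

```python
from typing import Callable, Dict, List


class InMemoryGallery:
    """Stand-in for the image store; names are assigned sequentially."""

    def __init__(self) -> None:
        self.images: Dict[str, bytes] = {}

    def save(self, data: bytes) -> str:
        name = f"image-{len(self.images) + 1}"
        self.images[name] = data
        return name

    def load(self, name: str) -> bytes:
        return self.images[name]


def invoke_external(
    cache: Dict[str, List[str]],
    gallery: InMemoryGallery,
    cache_key: str,
    call_api: Callable[[], List[bytes]],
) -> List[str]:
    """Call the provider on a cache miss; re-save cached images on a hit."""
    cached_names = cache.get(cache_key)
    if cached_names is None:
        names = [gallery.save(data) for data in call_api()]
        cache[cache_key] = names
        return names
    # Cache hit: skip the paid API call but still create new gallery entries.
    return [gallery.save(gallery.load(name)) for name in cached_names]
```

Repeated invokes with identical parameters therefore produce new gallery entries while the provider is called only once.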

lstein · 2026-04-19T19:19:55Z

Almost there! I found just one more thing that I missed on the first go-rounds. In the OpenAI models, the "Remix" recall function is not restoring the advanced Quality, Background or Input Fidelity settings.

Remix recall iterates through ImageMetadataHandlers but only Gemini's temperature handler was wired up — OpenAI's quality, background, and input_fidelity were stored in image metadata but never parsed back into the params slice. Add the three missing handlers so Remix restores these settings as expected.

lstein

Looking good.

CypherNaught-0x requested review from JPPhoto, Pfannkuchensack, blessedcoolant, dunkeroni and lstein as code owners February 17, 2026 11:30

Pfannkuchensack self-assigned this Feb 19, 2026

lstein added the v6.13.x label Feb 20, 2026

lstein added this to Invoke - Community Roadmap Feb 20, 2026

lstein moved this to 6.13.x in Invoke - Community Roadmap Feb 20, 2026

CypherNaught-0x added 2 commits February 27, 2026 11:12

feat: initial external model support

19650f6

feat: support reference images for external models

74ecc46

Pfannkuchensack added 2 commits April 15, 2026 02:49

chore: prettier format ModelIdentifierFieldInputComponent

844632f

Pfannkuchensack added 5 commits April 17, 2026 12:00

Merge branch 'main' into external-models

faec050

Merge remote-tracking branch 'upstream/main' into external-models

d56905e

fix: remove unsupported thinkingConfig from Gemini image models and r…

69cf9f7

…estrict GPT Image models to txt2img

chore typegen

cada26d

chore(docs): regenerate settings.json for external provider fields

0b6f429

Pfannkuchensack and others added 3 commits April 18, 2026 22:56

Merge branch 'main' into external-models

931a70e

Merge branch 'main' into external-models

c28094b

Pfannkuchensack added 3 commits April 19, 2026 04:40

Merge branch 'main' into external-models

30e1d68

Chore typegen

c8542e7

Pfannkuchensack added 4 commits April 19, 2026 19:32

Chore typegen + ruff

398c5be

CHore ruff format

ed00059

Pfannkuchensack and others added 2 commits April 19, 2026 22:03

Merge branch 'main' into external-models

d626727

lstein self-requested a review April 20, 2026 17:02

lstein approved these changes Apr 20, 2026


lstein enabled auto-merge (squash) April 20, 2026 17:03

lstein merged commit 9deb545 into invoke-ai:main Apr 20, 2026
16 checks passed
